Abstract

Flexible models of object classes, based on linear combinations of prototypical images, are capable 
of matching novel images of the same class and have been shown to be a powerful tool to solve several 
fundamental vision tasks such as recognition, synthesis and correspondence. The key problem in creating 
a specific flexible model is the computation of pixelwise correspondence between the prototypes, a task 
done until now in a semiautomatic way. In this paper we describe an algorithm that automatically 
bootstraps the correspondence between the prototypes. The algorithm - which can be used for 2D 
images as well as for 3D models - is shown to synthesize successfully a flexible model of frontal face 
images and a flexible model of handwritten digits. 
1 Introduction


In recent papers we have introduced a new type 
of flexible model for images of objects of a certain 
class. The idea is to represent images of a certain 
type - for instance images of frontal faces - as 
the linear combination of prototype images and 
their affine deformations. This flexible model can 
be used as a generative model to synthesize novel 
images of the same class. It can also be used 
to analyze novel images by estimating the model 
parameters via an optimization procedure. Once
the parameters are estimated, the model can be
used for indexing, for recognition, for image
compression and for image correspondence.

At the very heart of our flexible models is an
image representation in terms of which a linear
combination of images makes sense. For a set of
images to behave as vectors, they must be in
pixelwise correspondence (see [Poggio and Beymer,
1996]). Our model uses pixelwise correspondence
between example images and should not be
confused with techniques which use linear
combinations of images such as the so-called
eigenfaces technique ([Kirby and Sirovich, 1990];
[Turk and Pentland, 1991]). In our approach, the
correspondences between a reference image and the
other example images are obtained in a
preprocessing phase. Once the correspondences are
computed, an image is represented as a shape
vector and a texture vector. The shape vector
specifies how the 2D shape of the example differs
from a reference image and corresponds to the flow
field between the two images. Analogously, the
texture vector specifies how the texture differs
from the reference texture. Here we are using the
term “texture” to mean simply the pixel
intensities (grey level or color values) of the
image. Our flexible model for an object class is
then a linear combination of the example shape and
texture vectors.

The flexible model has been used for many
synthesis and analysis tasks, mentioned later.


1.1 A key problem: creating the model 
from prototypes 

The distinguishing aspect of our linear flexible
models is that they are linear combinations of
prototype shape and texture vectors and not of
images ([Beymer and Poggio, 1996]). The
prototypical images must be vectorized first, that
is, correspondence must be computed among them.

This is a key step and in general a difficult
one. It needs to be done only once, at the stage
of developing the model. At run-time no further
correspondence is needed, and in fact the model
can be used to compute correspondence if
necessary. In our past papers we computed
correspondence between the prototypes with
automatic techniques such as optical flow.
Sometimes, however, we were forced to use
interactive techniques requiring the user to
specify at least some of the correspondences (see
for instance [Lines, 1996]). An automatic
technique that could set prototypes in
correspondence would therefore be desirable even
if very slow. In addition, any claim of biological
plausibility would require the demonstration of
such a technique.

In this paper we describe a bootstrapping
technique that seems capable of computing
correspondence between prototypical images in
cases in which standard optical flow algorithms
fail.

1.2 Past work 

The “linear class” idea of [Poggio and Vetter,
1992] and [Vetter and Poggio, 1995] together with
the image representation used by [Beymer et al.,
1993] (see [Beymer and Poggio, 1996] for a
review) is the main motivation behind the work of
this and previous papers. Poggio and Vetter
introduced the idea of linear combinations of views
to define and model classes of objects, trying to
extend the results of [Ullman and Basri, 1991] and
[Shashua, 1992], who showed that linear
combinations of three views of a single object may
be used to obtain any other views of the object
(barring self-occlusion and assuming orthographic
projection). Poggio and Vetter defined a linear
object class as a set of 2D views of objects which
cluster in a small linear subspace of
$\mathcal{R}^{2n}$, where $n$ is the number of
feature points on each object. They used the model
mainly for synthesis tasks. In particular, for
linear object classes, affine transformations can
be learned exactly from a small set of examples
and used to generate new, virtual views. For
instance, it is possible to learn to generate a
new virtual view of an object from a single
example view, represented as a 2D shape vector,
provided appropriate prototypical views of other
objects of the same class are available (under
orthographic projection). Thus, new views of a
specific face with a different pose or expression
can be estimated and synthesized from a single
view. In a very similar way, 3D structure can be
estimated from a single image if the image and the
structure of a sufficient number of prototypical
objects of the same class are available. More
generally, flexible models of this type can
underlie learning of visual tasks in a top-down
way, specific to object classes (Jones, Sinha,
Vetter and Poggio, in preparation).

The problem of using the flexible model to
analyze novel images was the main concern of Jones
and Poggio ([Jones and Poggio, 1995]; [Jones and
Poggio, 1996]). They introduced a novel approach
to match flexible linear models to novel images
that can be used for several visual analysis tasks,
including recognition, image correspondence and
image compression.

Recently we have become aware of several
papers dealing with various forms of the idea of
linear combination of prototypical images. Choi et
al. (1991) were perhaps the first (see also [Poggio
and Brunelli, 1992]) to suggest a model which
represented face images with separate shape and
texture components, using a 3D model to provide
correspondences between example face images. The
work of Taylor and coworkers ([Cootes and Taylor,
1992]; [Cootes and Taylor, 1994]; [Cootes et al.,
1992]; [Cootes et al., 1994]; [Cootes et al.,
1993]; [Hill et al., 1992]; [Lanitis et al., 1995])
on active shape models is probably the closest to
ours. Many other flexible models have been
proposed, such as the model of Blake and Isard
[Blake and Isard, 1994].


2 Linear models 

In this section we formally specify the linear
object class model and describe the matching
algorithm used to analyze a novel image in terms of
a flexible model.

2.1 Formal specification 

To write the linear object class model
mathematically, we must first introduce some
notation, which we summarize from [Jones and
Poggio, 1996]. An image $I$ is viewed as a mapping

$$I : \mathcal{R}^2 \to \mathcal{I}$$

such that $I(x,y)$ is the intensity value of point
$(x,y)$ in the image. Here we are only considering
grey level images. To define a model, a set of
example images called prototypes is given. We
denote these prototypes as $I_0, I_1, \ldots, I_N$.
Let $I_0$ be the reference image. The pixelwise
correspondences between $I_0$ and each example
image are denoted by a mapping

$$S_j : \mathcal{R}^2 \to \mathcal{R}^2$$

which maps the points of $I_0$ onto $I_j$, i.e.
$S_j(x,y) = (\hat{x},\hat{y})$ where
$(\hat{x},\hat{y})$ is the point in $I_j$ which
corresponds to $(x,y)$ in $I_0$. We refer to $S_j$
as a correspondence field and interchangeably as
the shape vector for the vectorized $I_j$. We
define $I_j \circ S_j(x,y) = I_j(S_j(x,y))$. We
also define

$$T_j(x,y) = I_j \circ S_j(x,y). \qquad (1)$$

$T_j$ is the warping of image $I_j$ onto the
reference image $I_0$. So, $\{T_j\}$ is the set of
shape-free prototype images, that is, the texture
vectors. They are shape free in the sense that
their shape is the same as the shape of the
reference image.
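
Equation 1 can be illustrated with a minimal NumPy sketch. It assumes that a correspondence field is stored as two arrays of absolute coordinates and that non-integer positions are resolved by bilinear interpolation with clamping at the border; the paper does not prescribe either choice, and the function names are hypothetical.

```python
import numpy as np

def bilinear_sample(image, xs, ys):
    """Sample image at real-valued (xs, ys); coordinates are clamped to the image."""
    h, w = image.shape
    xs = np.clip(xs, 0.0, w - 1.0)
    ys = np.clip(ys, 0.0, h - 1.0)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    fx = xs - x0
    fy = ys - y0
    top = (1 - fx) * image[y0, x0] + fx * image[y0, x1]
    bottom = (1 - fx) * image[y1, x0] + fx * image[y1, x1]
    return (1 - fy) * top + fy * bottom

def backward_warp(image_j, field_x, field_y):
    """Texture T_j(x, y) = I_j(S_j(x, y)), with S_j given as absolute coordinates."""
    return bilinear_sample(image_j, field_x, field_y)

# Illustrative data (assumed): a small ramp image and two simple fields.
h, w = 4, 5
image = np.arange(h * w, dtype=float).reshape(h, w)
ys, xs = np.mgrid[0:h, 0:w].astype(float)
identity = backward_warp(image, xs, ys)        # S_j is the identity
shifted = backward_warp(image, xs + 1.0, ys)   # S_j(x, y) = (x + 1, y)
```

With the identity field the texture equals the image itself; with the shifted field each reference pixel picks up the intensity one pixel to its right in $I_j$.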

Using this notation, we are now ready to
specify the model. We define the flexible model as
the set of images $I^{model}$, parameterized by
$\mathbf{b} = [b_0, b_1, \ldots, b_N]$ and
$\mathbf{c} = [c_0, c_1, \ldots, c_N]$, such that

$$I^{model} \circ \left( \sum_{i=0}^{N} c_i S_i \right) = \sum_{j=0}^{N} b_j T_j. \qquad (2)$$

The summation $\sum_{i=0}^{N} c_i S_i$ constrains the
shape of every model image to be a linear
combination of the prototype shapes. Similarly, the
summation $\sum_{j=0}^{N} b_j T_j$ constrains the
texture of every model image to be a linear
combination of the prototype textures. Note that
the coefficients for the shape and texture parts of
the model are independent. This adds greater
expressiveness to the model as it allows the shape
of one prototype to be used along with the texture
of another, for example.

To increase the flexibility of the model to
handle translations, rotations, scaling and
shearing, we add an affine transformation. The
equation for the model images can now be written

$$I^{model} \circ \left( A \circ \sum_{i=0}^{N} c_i S_i \right) = \sum_{j=0}^{N} b_j T_j \qquad (3)$$

where $A : \mathcal{R}^2 \to \mathcal{R}^2$ is an
affine transformation parametrized by $\mathbf{p}$.
Furthermore, we constrain $\sum_{i=0}^{N} c_i = 1$ in
order to avoid redundancy in the parameters, since
the affine parameters allow for changes in scale.
In the case of texture, the $b_j$'s are not
constrained to sum to 1.
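
The redundancy removed by this constraint can be made explicit with a short worked equation. The decomposition $A(v) = Mv + t$ and the scale factor $s$ are notation introduced here for illustration only.

```latex
% Scaling every shape coefficient by s is absorbed by a different affine map A':
A\Big(\sum_{i=0}^{N} (s\,c_i)\, S_i(x,y)\Big)
  = (sM) \sum_{i=0}^{N} c_i S_i(x,y) + t
  = A'\Big(\sum_{i=0}^{N} c_i S_i(x,y)\Big),
\qquad A'(v) = (sM)\,v + t .
```

Since $A'$ is again affine, the pairs $(s\mathbf{c}, A)$ and $(\mathbf{c}, A')$ describe the same model shape, and fixing $\sum_i c_i = 1$ selects one representative.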

Given values for $\mathbf{c}$, $\mathbf{b}$ and
$\mathbf{p}$, the model image can be rendered by
computing $(\hat{x},\hat{y}) = A \circ
\sum_{i=0}^{N} c_i S_i(x,y)$ and $g = \sum_{j=0}^{N}
b_j T_j(x,y)$ for each $(x,y)$ in the reference
image. Then the $(\hat{x},\hat{y})$ pixel is
rendered by assigning $I^{model}(\hat{x},\hat{y}) =
g$, that is, by warping the texture into the model
shape.
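
The rendering procedure can be sketched as follows. This is a simplified illustration under stated assumptions: shapes are stored as absolute coordinates, the affine parameters are encoded as a 2x3 matrix, and the forward warp uses nearest-pixel splatting without hole filling. None of these details are taken from the paper, and `render_model` is a hypothetical name.

```python
import numpy as np

def render_model(shapes, textures, c, b, affine=None):
    """Render I_model by writing g = sum_j b_j T_j(x, y) at A(sum_i c_i S_i(x, y)).

    shapes:   array (N+1, 2, H, W) holding absolute (x, y) coordinates S_i
    textures: array (N+1, H, W) holding shape-free textures T_j
    affine:   optional 2x3 matrix [M | t] (assumed encoding of p); identity if None
    """
    c = np.asarray(c, dtype=float)
    b = np.asarray(b, dtype=float)
    shape = np.tensordot(c, shapes, axes=1)      # (2, H, W): sum_i c_i S_i
    g = np.tensordot(b, textures, axes=1)        # (H, W):    sum_j b_j T_j
    x_hat, y_hat = shape[0], shape[1]
    if affine is not None:
        m = np.asarray(affine, dtype=float)
        x_hat, y_hat = (m[0, 0] * x_hat + m[0, 1] * y_hat + m[0, 2],
                        m[1, 0] * x_hat + m[1, 1] * y_hat + m[1, 2])
    h, w = g.shape
    out = np.zeros((h, w))
    xi = np.rint(x_hat).astype(int)
    yi = np.rint(y_hat).astype(int)
    inside = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
    out[yi[inside], xi[inside]] = g[inside]      # nearest-pixel splat; holes stay 0
    return out

# Illustrative data (assumed): two prototypes, the second shifted by 2 pixels in x.
h, w = 3, 6
ys, xs = np.mgrid[0:h, 0:w].astype(float)
shapes = np.stack([np.stack([xs, ys]), np.stack([xs + 2.0, ys])])
textures = np.stack([xs + 10.0 * ys, np.zeros((h, w))])
model = render_model(shapes, textures, c=[0.5, 0.5], b=[1.0, 0.0])
```

Here the shape coefficients average the two prototype shapes, so the texture of the first prototype appears shifted by one pixel.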

2.2 Analysis by model matching 

In the framework of this model, we can associate
to each image in a class a shape vector and a
texture vector. We refer to the process of
analyzing an image in terms of its shape and
texture vector as vectorizing an image.

A novel image of an object in a particular class
is vectorized by matching a model of that class
to the novel image. Matching means finding the
best coefficients of the model so that the rendered
model image most closely resembles the novel
image. The general strategy for matching is to
define an error function between the novel image
and the current guess for the closest model
image. This error is then minimized with respect
to the model parameters ($c_i$, $b_j$, and $p_i$)
by using a stochastic gradient descent algorithm.
Following this strategy, we define the sum of
squared differences error

$$E(\mathbf{c}, \mathbf{b}, \mathbf{p}) = \frac{1}{2} \sum_{x,y} \left[ I^{novel}(x,y) - I^{model}(x,y) \right]^2 \qquad (4)$$

where the sum is over all pixels $(x,y)$ in the
images, $I^{novel}$ is the novel grey level image
being matched and $I^{model}$ is the current guess
for the model grey level image. From equation 3 we
see that in order to compute $I^{model}$ we either
have to invert the shape transformation
$(A \circ \sum_{i=0}^{N} c_i S_i)$ or work in the
coordinate system of the reference image. It is
computationally more efficient to work in the
coordinate system of the reference image. To do
this we simply apply the shape transformation
(given some estimated values for $\mathbf{c}$ and
$\mathbf{p}$) to both $I^{novel}$ and $I^{model}$.
From equation 3, and with the notation

$$S = \left( A \circ \sum_{i=0}^{N} c_i S_i \right) \qquad (5)$$

we obtain the following error function (if we
choose the $L_2$ norm)

$$E(\mathbf{c}, \mathbf{b}, \mathbf{p}) = \frac{1}{2} \sum_{x,y} \left[ I^{novel} \circ S(x,y) - \sum_{j=0}^{N} b_j T_j(x,y) \right]^2. \qquad (6)$$

Minimizing the error yields the model image
which best fits the novel image with respect to
the $L_2$ norm. So far we have used the $L_2$ norm
for convenience but other norms may be more
appropriate (e.g. robust statistics).

In order to minimize the error function any
minimization algorithm could be used. We have
chosen to use the stochastic gradient descent
algorithm [Viola, 1995] because it is fast and can
escape from local minima.
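
As a reduced illustration of equation 6, the sketch below minimizes the error over the texture coefficients $\mathbf{b}$ only, holding the shape transformation $S$ fixed so that $I^{novel} \circ S$ is already available in the reference frame. It uses a generic minibatch stochastic gradient descent over randomly sampled pixels; this is an assumption-laden stand-in, not the algorithm of [Viola, 1995], and the step size, batch size and function names are hypothetical.

```python
import numpy as np

def sse_error(warped_novel, textures, b):
    """E = 1/2 * sum_xy [I_novel o S (x, y) - sum_j b_j T_j(x, y)]^2."""
    diff = warped_novel - np.tensordot(b, textures, axes=1)
    return 0.5 * np.sum(diff ** 2)

def fit_texture_sgd(warped_novel, textures, steps=2000, batch=64, lr=0.1, seed=0):
    """Estimate b by minibatch SGD; shape parameters c and p are held fixed."""
    rng = np.random.default_rng(seed)
    target = warped_novel.ravel()
    basis = textures.reshape(len(textures), -1)          # (N+1, pixels)
    b = np.zeros(len(textures))
    for _ in range(steps):
        idx = rng.integers(0, target.size, size=batch)   # random subset of pixels
        residual = target[idx] - b @ basis[:, idx]
        grad = -(basis[:, idx] @ residual) / batch       # gradient of E on the sample
        b -= lr * grad
    return b

# Synthetic check (assumed data): the target is an exact combination of textures.
rng = np.random.default_rng(1)
textures = rng.standard_normal((3, 20, 20))
b_true = np.array([0.7, -0.2, 1.5])
novel_in_ref_frame = np.tensordot(b_true, textures, axes=1)
b_est = fit_texture_sgd(novel_in_ref_frame, textures)
```

In the full matching problem the gradient with respect to $\mathbf{c}$ and $\mathbf{p}$ also involves the spatial derivatives of $I^{novel}$, which this sketch omits.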

2.3 Optical Flow 

For some prototypes, the pixelwise
correspondences from the reference image to the
prototype can be found accurately by an optical
flow algorithm. We have mostly used the
multiresolution, Laplacian-based optical flow
algorithm described in [Bergen and Hingorani,
1990].



Figure 1: Given the flexible model provided by
the combination of image 1 and image 2 (in
correspondence), the goal is to find the
correspondence between image 1 (or image 2) and the
novel image 3. Our solution is to first find the
linear combination of image 1 and image 2 that is
closest to image 3 (this is image 1’) and then find
the correspondences from image 1’ to image 3 using
optical flow. The two flow fields can then be
composed to yield the desired flow from image 1 to
image 3.


3 Bootstrapping the synthesis 
of a flexible model 

Suppose that we have a flexible model consisting
of N prototypes in correspondence. It is tempting
to try to use it to compute the correspondence
to a novel image of an object of the same class so
that it can be added to the set of prototypes. The
obvious flaw in this strategy is that if the flexible
model can compute good correspondence to the
new image then there is no need to add it to the
flexible model, since it will not increase its
expressive power. If it cannot, then the new
prototype cannot be incorporated as such. A
possible way out of this conundrum is to bootstrap
the flexible model by using it together with an
optical flow algorithm.


3.1 The basic recursive step: improving
the flexible model with optical flow

Suppose that an existing flexible model is not
powerful enough to match a new image and
thereby find correspondence with it. The idea
is first to find rough correspondences to the novel
image using the (inadequate) flexible model of the
object class and then to improve these
correspondences by using an optical flow algorithm.
This idea is illustrated in figure 1. In the
figure, a model consisting of (vectorized) image 1
and image 2 (and the pixelwise correspondences
between them) is first fit to image 3. Call the
best fitting linear combination of images 1 and 2
image 1’. The correspondences are then improved by
running an optical flow algorithm between the
intermediate image 1’ and image 3. Notice that this
technique can be regarded as a class specific
regularization of optical flow, which constrains
the correspondence appropriately.

3.1.1 Example 

An example of our basic step is shown in figure 2.
In this figure, an optical flow algorithm is used to
find the correspondences from image (a) to image
(b). The resulting correspondences are not very
good, as shown by image (c), which is the backward
warp of image (b) according to the
correspondences found by optical flow. Image (c)
should have the texture of image (b) and the shape
of image (a). A better way to find the
correspondences to image (b) is to first fit a
model of faces to image (b), using as a model 20
prototype face images (with known
correspondences). The model was matched to image
(b) as described in section 2.2. The resulting best
match is shown as image (d). Next, optical flow was
run between image (d) and image (b) to further
improve the correspondences found by the matching
algorithm. The two correspondence fields were
combined to get the correspondences from image (a)
to image (b). Image (e) is the backward warp of
image (b) according to the final correspondence. A
comparison of image (c) with image (e) shows that
better correspondences are found by our basic
recursive step than by optical flow alone.

Figure 2: This figure shows the basic idea behind bootstrapping. Image (a) is the reference face. Image
(b) is a prototype. Image (c) is the image resulting from backward warping the prototype onto the
reference face using the correspondences found by an optical flow algorithm. Image (d) is the model
image which best matches the prototype using a model consisting of 20 prototypical faces (which did
not include image (b)). Image (e) is the image resulting from backward warping the prototype onto the
reference face using the flow field which was composed from matching the face model and then running
an optical flow algorithm between image (d) and image (b) to further improve the correspondences.
This is the basic step of the bootstrapping algorithm.

3.2 A bootstrapping algorithm for creating
a flexible model

The idea of bootstrapping is to start from a small
flexible model consisting of just 2 prototypical
images and to increase its size (and representation
power) by iterating the recursive step described
above, progressively adding new images by
setting them in correspondence with the model.

There are two main problems with building a 
linear flexible model. The first one is to choose 
the reference image, relative to which shape and 
texture vectors are represented. The second is to 
automatically compute the correspondences even 
in cases in which optical flow fails. 


In principle, any example image could be used
as the reference image. However, small
peculiarities in an image can influence the
matching process strongly. Thus, an image which is
close to all images is more reliable, since the
computation of the correspondence is more stable
for small distortions than for bigger ones. The
average image of the whole data set, for which the
average distance to the whole data set is by
definition at a minimum, is the optimal reference
image. Since the correspondences between the images
cannot be computed correctly in one step, the
average has to be computed in an iterative
procedure. Starting from an arbitrary image as the
preliminary reference, a (noisy) correspondence
between all other images and this reference is
first computed using an optical flow algorithm. On
the basis of these correspondences an average image
can be computed, which now serves as a new
reference image. This procedure of computing the
correspondences and calculating a new average image
is repeated until a stable average (vectorized)
image is obtained.

The correspondence fields obtained through
the optical flow algorithm from this final
average image to all the examples are usually far
from perfect. The bootstrapping idea is to improve
the correspondences by applying iteratively the
basic step described above while also increasing
the expressive power of the flexible model. We
could incorporate into the flexible model one new
image at each timestep. Instead, we have
implemented an equivalent algorithm in which the
first step is to form a linear object model from
the correspondences obtained from all images with
optical flow. Since some of these correspondence
fields are not correct and all are noisy, this
algorithm uses only the most significant fields as
provided by a standard PCA decomposition of the
shape and the texture vectors. Instead of adding
new images, the algorithm increases with successive
iterations the number of principal components,
ordered according to the associated eigenvalues
(the allowed range of parameters of the selected
principal components can also be increased with a
similar effect). At each iteration a flexible model
is selected and used to match each image. The
optical flow algorithm estimates correspondence
between the image and the approximation provided
by the flexible model. This field is then added to
the correspondence field implied by the matched
model, giving a new correspondence field between
the reference image and the example. The
correspondence fields obtained by this procedure
will finally lead to a new average image and also
to new principal components, which can be
incorporated in an improved flexible model.
Iterating this procedure with increasing expressive
power of the model (by increasing the number of
principal components) leads to stable
correspondence fields between the reference image
and the examples. The number of iterations as well
as the increasing complexity of the model can be
regarded as regularization parameters of this
bootstrapping process.
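
The PCA step can be illustrated with a small NumPy sketch. It assumes each shape (or texture) vector is flattened into one row of a matrix and uses a singular value decomposition to obtain the principal components ordered by eigenvalue. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def pca_model(vectors, n_components):
    """PCA of row vectors; returns mean, first n components (rows) and eigenvalues."""
    mean = vectors.mean(axis=0)
    centered = vectors - mean
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    eigenvalues = s ** 2 / max(len(vectors) - 1, 1)
    return mean, vt[:n_components], eigenvalues[:n_components]

def approximate(vector, mean, components):
    """Project a vector onto the retained components and reconstruct it."""
    coeffs = components @ (vector - mean)
    return mean + coeffs @ components

# Synthetic shape vectors (assumed): 6 examples with 10 coordinates each.
rng = np.random.default_rng(0)
shape_vectors = rng.standard_normal((6, 10))
errors = []
for n in (1, 3, 5):   # growing model, analogous to the schedule of components
    mean, comps, _ = pca_model(shape_vectors, n)
    approx = approximate(shape_vectors[0], mean, comps)
    errors.append(np.linalg.norm(approx - shape_vectors[0]))
```

Keeping only a few components discards the low-variance directions, which is how noisy correspondence fields are filtered in early iterations; the approximation error shrinks as components are added.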


3.2.1 Pseudo code of an efficient algorithm

1A: Selecting a reference image.

Select an arbitrary image $I_i$ as reference image $I_{ref}$.

Until convergence do {

    For all $I_i$ {

        Compute correspondence field $S_i$ between
        $I_{ref}$ and $I_i$ using optical flow.

        Backwards warp $I_i$ onto $I_{ref}$ using $S_i$
        to get the texture map $T_i$.

    end For }

    Compute average over all $S_i$ and $T_i$.

    Forward warp $T_{average}$ using $S_{average}$
    to create $I_{average}$.

    Convergence test: is $I_{average} - I_{ref} <$ limit ?

    Copy $I_{average}$ to $I_{ref}$.

end Until }

1B: Computing the correspondence.

Until number n of principal components used in
the linear model is maximal {

    Perform a principal component analysis on $S_i$
    and separately on $T_i$.

    Select the first n principal components for the
    linear model.

    Approximate each $I_i$ by the linear model
    with $I_i^{model}$.

    Compute correspondence field $S_i'$ between
    $I_i^{model}$ and $I_i$ using optical flow.

    Combine $S_i'$ and $S_i^{model}$ to $S_i^{new}$.

    Backwards warp $I_i$ onto $I_{ref}$ using $S_i^{new}$
    to get the texture map $T_i$.

    Copy all $S_i^{new}$ to $S_i$.

    Increase number n of principal components used
    in the linear model.

end Until }
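
Step 1A can be sketched in Python as follows. The optical flow routine is passed in as a callable because the paper relies on an external algorithm; `flow_fn`, the nearest-pixel warps and the mean absolute difference used in the convergence test are all assumptions of this sketch.

```python
import numpy as np

def warp_nearest(image, xs, ys):
    """Backward warp by nearest-pixel lookup (coordinates clamped to the image)."""
    h, w = image.shape
    xi = np.clip(np.rint(xs).astype(int), 0, w - 1)
    yi = np.clip(np.rint(ys).astype(int), 0, h - 1)
    return image[yi, xi]

def splat_nearest(texture, xs, ys):
    """Forward warp by nearest-pixel splatting; unfilled pixels keep the texture value."""
    h, w = texture.shape
    out = texture.copy()
    xi = np.rint(xs).astype(int)
    yi = np.rint(ys).astype(int)
    inside = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
    out[yi[inside], xi[inside]] = texture[inside]
    return out

def build_reference(images, flow_fn, limit=0.3, max_iter=10):
    """Iterate flow estimation and averaging until the reference stops changing.

    flow_fn(ref, img) -> (xs, ys): absolute coordinates in img for each ref pixel.
    """
    reference = images[0].astype(float)              # arbitrary starting image
    for _ in range(max_iter):
        fields, textures = [], []
        for img in images:
            xs, ys = flow_fn(reference, img)
            fields.append((xs, ys))
            textures.append(warp_nearest(img, xs, ys))
        mean_x = np.mean([f[0] for f in fields], axis=0)
        mean_y = np.mean([f[1] for f in fields], axis=0)
        mean_texture = np.mean(textures, axis=0)
        average = splat_nearest(mean_texture, mean_x, mean_y)
        change = np.mean(np.abs(average - reference))
        reference = average
        if change < limit:
            break
    return reference

def identity_flow(reference, image):
    """Placeholder flow that reports no motion; stands in for a real optical flow."""
    ys, xs = np.mgrid[0:reference.shape[0], 0:reference.shape[1]].astype(float)
    return xs, ys

rng = np.random.default_rng(0)
images = [rng.uniform(0, 255, size=(8, 8)) for _ in range(5)]
reference = build_reference(images, identity_flow)
```

With the placeholder flow the loop reduces to plain pixel averaging; a real optical flow routine would be substituted for `identity_flow` to obtain a shape-averaged reference.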
4 Results

The method described in the previous sections 
was tested on two different classes of images. One 
class was frontal views of human faces and the 
second was handwritten digits. 

4.1 Face images 

4.1.1 Data set 

130 frontal images of Caucasian faces were used in
our experiments. The images were originally
rendered for psychophysical experiments [Troje and
Bülthoff, 1995] under ambient illumination
conditions from a data base of three-dimensional
human head models recorded with a laser scanner
(Cyberware™). All faces were without makeup,
accessories, and facial hair. Additionally, the
head hair was removed digitally (but with manual
editing), via a vertical cut behind the ears. The
resolution of the grey-level images was 256-by-256
pixels and 8 bit.

Preprocessing: First the faces were segmented
from the background and aligned roughly
by automatically adjusting them to their
two-dimensional centroid. The centroid was
computed by evaluating separately the average of
all x, y coordinates of the image pixels related to
the face, independent of their intensity value.
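
The centroid alignment might look like the following sketch. It assumes a binary mask marks the segmented face pixels and that alignment means translating the image by whole pixels so that the centroid falls on the image center; the paper does not state the target position, and `center_on_centroid` is a hypothetical name.

```python
import numpy as np

def center_on_centroid(image, mask):
    """Translate the image so the centroid of the masked pixels lands on the center."""
    h, w = image.shape
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()            # intensity-independent centroid
    dy = int(round((h - 1) / 2 - cy))
    dx = int(round((w - 1) / 2 - cx))
    out = np.zeros_like(image)
    # Copy with an integer shift and no wrap-around.
    src_y = slice(max(0, -dy), min(h, h - dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_y = slice(max(0, dy), min(h, h + dy))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = image[src_y, src_x]
    return out

# Illustrative data (assumed): a 3x3 blob in the corner of a 7x7 image.
image = np.zeros((7, 7))
image[0:3, 0:3] = 1.0
centered = center_on_centroid(image, image > 0)
ys2, xs2 = np.nonzero(centered)
```

After the shift the blob occupies rows and columns 2 to 4, so its centroid coincides with the center pixel (3, 3).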

4.1.2 Evaluation 

The method described in the previous sections
was successfully applied to all face images
available.

The step involving synthesis of the reference
(average) image was tested for each image as a
starting image in the algorithm. As a convergence
criterion we used a threshold on the minimum
average change of the pixel gray value (0.3, where
the range was 256). The threshold was reached
in every case within 5 iterations and mostly after
3. The final reference images could not be
distinguished under visual inspection. One of these
reference images is shown in the second column
of figure 3; the same reference image was used for
the final correspondence finding procedure.

Optical flow yields the correct correspondence
between the reference image and each example
image only in 80% of all cases. In the remaining
cases the correspondence is partly incorrect,
as shown in figure 3. The center column shows
the images which result from backward warping
the face images (left column) onto the reference
image using the correspondence fields obtained
through the optical flow algorithm. In the first
iteration of the correspondence finding procedure
the first 2 principal components of the shape
vectors (that is, of the correspondence fields) and
of the texture vectors are used in the flexible
model. Then the correspondence field provided
by matching with the flexible model is combined
with the correspondence field obtained by the
optical flow algorithm between the face image and
its flexible model approximation. The backward
warps using these correspondence fields are shown
in the fourth column. The correspondence fields
were iterated by slowly increasing the number of
principal components used in the flexible model.
After four iterations with 2, 10, 30 and 80
principal components, the correspondence fields
between the reference face and all example images
did not reveal any obvious errors (right column).

In a second experiment the same data set was
split into a set of 100 training images and a set
of 30 test images. This experiment was used to
document the improvements and changes within
the linear model during the bootstrapping. The
bootstrap algorithm was run on the training set as
described earlier. After each bootstrapping step
the $L_2$ norm between the reference texture vector
and the texture vector of each example image was
computed. Interestingly, the $L_2$ norm does not
always reflect the improvement in correspondence
as observed by visual inspection. The $L_2$ norm
may even increase slightly during the process.

Figure 3: Five of the most difficult faces in our data set. The correspondence between face images
(left column) and a reference face can be visualized by backward warping of the face images onto the
reference image (three columns on the right). The correspondence obtained through the optical flow
algorithm does not allow a correct mapping (center column). The first iteration with a linear flexible
model consisting of two principal components already yields a significant improvement (top row). After
four iterations with 2, 10, 30 and 80 components, respectively, all correspondences were correct (right
column).

Figure 4: For each of the 10 digits the figure shows
the first five shape eigenvectors (left to right) of
the model (obtained from 250 prototypical
digits). Each column displays how each shape
eigenvector changes relative to the average digit
(in dashed box). The coefficient ranges from +5
(top) to -5 standard deviations (bottom) of each
eigenvector.

After each bootstrapping step the 30 test
images were approximated by the linear object
model formed by the first 80 principal
components obtained on the training set of 100
images. The Mahalanobis distance of each
approximation to the average face image was
computed. During the bootstrapping procedure the
Mahalanobis distance decreased on average over all
test images.
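
One common way to evaluate such a distance is in the space of principal component coefficients, where each coefficient is normalized by the standard deviation along its component. The paper does not say how the distance was computed, so the sketch below is an assumption, and the names and data are hypothetical.

```python
import numpy as np

def mahalanobis_to_mean(vector, mean, components, eigenvalues):
    """Distance to the mean with each PCA coefficient scaled by its variance."""
    coeffs = components @ (vector - mean)
    return float(np.sqrt(np.sum(coeffs ** 2 / eigenvalues)))

# Synthetic training vectors (assumed): 20 examples, 5 dimensions, unequal variances.
rng = np.random.default_rng(0)
train = rng.standard_normal((20, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1])
mean = train.mean(axis=0)
_, s, vt = np.linalg.svd(train - mean, full_matrices=False)
eigenvalues = s ** 2 / (len(train) - 1)
d2 = [mahalanobis_to_mean(v, mean, vt, eigenvalues) ** 2 for v in train]
```

Under this definition a smaller distance means the approximation lies closer to the average face relative to the variation seen in the training set.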


4.2 Digits 

4.2.1 Data set and Preprocessing 

The images used in these experiments were from
the US postal service database (262 for each of
the 10 digits). The original resolution of
16-by-16 pixels was increased to 32-by-32 pixels
and the images were blurred with a Gaussian 5-by-5
kernel.


4.2.2 Evaluation 

The bootstrapping algorithm was used for all 10
digits without modification. For each digit we
obtained a linear model from the first 250
digits in the dataset. The reference image (average
shape) is shown in the dashed boxes in figure 4.
After computing the reference image and the
initial correspondence fields with optical flow,
new correspondence fields were obtained using 4
iterations of the bootstrapping algorithm. During
the 4 iterations the number of principal components
used in the algorithm was increased from 2 to 10,
30 and 80, respectively. Figure 4 shows the first
5 principal shape components of the final linear
model.

The models obtained by the bootstrapping
algorithm were used to match new digits which
were not part of the training set. In figure 5
ten new images of the digit 3 are approximated
with three different models of digits. Clearly the
“3” model approximates each of the new “3’s”
well, whereas the “5” and the “2” models provide
very poor approximations. These results suggest
that the digit models obtained with
bootstrapping could be used successfully for
recognition as well as for image compression.

Figure 5: 10 examples of the digit 3 are
approximated by 3 different linear models: in A a
model for “3’s”, in B a model for “2’s” and in C a
model for “5’s”. In each case the top row shows
the target “3’s”, the center row shows the optimal
approximation by the model and the third row shows
the difference between the top and center row. Each
model, obtained automatically by the bootstrapping
procedure from 250 prototypes, consisted of the
first 20 shape principal components and the first
texture component.


5 Conclusions 

The bootstrapping algorithm we described is not
a full answer to the problem of computing
correspondence between prototypes. It provides,
however, an initial and promising solution to the
very difficult problem of automatic synthesis of
the flexible models from a set of prototypical
examples. Notice that we have used multiresolution
optical flow as one part of our bootstrapping
algorithm. In principle other matching techniques
could be used within our bootstrapping scheme.